
feat: make flash attention configurable #60

Merged: 94 commits merged into main from feature/44-make-flash-attention-configurable on Jan 30, 2025

Conversation

@theissenhelen (Collaborator) commented on Jan 6, 2025

Current setup:

  • If flash-attn is available in the environment, the MultiHeadSelfAttention module automatically imports the corresponding attention function. At inference time, however, that information is not available.

Now:

  • The user specifies in the model config whether flash-attn, flex attention, or scaled dot product attention should be used.
  • Adds configurable parameters (soft cap, ALiBi) for flash attention;
    for ALiBi, adds a function to compute the slopes from the number of attention heads.
  • Scaled dot product attention now supports a sliding window, making it numerically equivalent to the flash/flex implementations (see the sketch below).
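
A minimal sketch of the ALiBi-slope and sliding-window pieces described above, using plain PyTorch. This is illustrative only, not the code added in this PR: the function names get_alibi_slopes and sliding_window_mask are hypothetical, and the slope recipe follows the original ALiBi paper's geometric-sequence convention.

```python
# Illustrative sketch only; names and shapes are assumptions, not this PR's API.
import math

import torch
import torch.nn.functional as F


def get_alibi_slopes(num_heads: int) -> torch.Tensor:
    """One slope per head, following the original ALiBi recipe: a geometric
    sequence starting at 2**(-8 / num_heads); non-power-of-two head counts
    interleave slopes taken from the next power of two."""

    def power_of_2_slopes(n: int) -> list[float]:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_heads).is_integer():
        slopes = power_of_2_slopes(num_heads)
    else:
        closest = 2 ** math.floor(math.log2(num_heads))
        extra = power_of_2_slopes(2 * closest)[0::2][: num_heads - closest]
        slopes = power_of_2_slopes(closest) + extra
    return torch.tensor(slopes)


def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask (True = attend) restricting each query to keys within
    `window` positions on either side."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window


# Example: sliding-window attention via PyTorch's built-in SDPA.
q = k = v = torch.randn(1, 8, 128, 64)            # (batch, heads, seq_len, head_dim)
mask = sliding_window_mask(seq_len=128, window=16)  # broadcasts over batch and heads
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

The real flash/flex backends handle the window inside the kernel; the explicit boolean mask here is the SDPA-side equivalent that the last bullet refers to.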

Todo:

  • Test the various attention options (a sketch of one such check follows this list).
  • Adjust test-case coverage.
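
For the first Todo item, one possible check (hypothetical, not part of this PR's test suite) is to verify that sliding-window scaled dot product attention matches a plain masked-softmax reference, which is the numerical-equivalence property the description relies on:

```python
import torch
import torch.nn.functional as F


def reference_attention(q, k, v, mask):
    """Plain masked softmax attention (True in `mask` = attend)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


def test_sliding_window_sdpa_matches_reference():
    torch.manual_seed(0)
    q, k, v = (torch.randn(1, 4, 32, 16) for _ in range(3))
    idx = torch.arange(32)
    mask = (idx[None, :] - idx[:, None]).abs() <= 4   # symmetric window of 4
    expected = reference_attention(q, k, v, mask)
    actual = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    torch.testing.assert_close(actual, expected, atol=1e-5, rtol=1e-5)
```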

theissenhelen and others added 30 commits September 27, 2024 08:42
* fix: change pre-cmmit autoupdate schedule to monthly

* fix: change the merge strategy for Changelog to Union

* fix: add .envrc to .gitignore

* ci: ignore pre-commit-config and readthedocs for changelog updates

* ci: fix to correct hpc workflow call

* fix: update precommit config

* chore: update pre-commits

* feat: add codeowners file

* chore: update dependencies

* ci: add hpc-config

* docs: changelog

* fix: respond to review comments

---------

Co-authored-by: Jesper Dramsch <[email protected]>
* feat: add configurability to dropout in MultiHeadSelfAttention

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>

* test: adjust to dropout_p

* doc: update changelog

* Feature/integrate reusable workflows (#16)

* ci: add public pr label

* ci: add readthedocs update check

* ci: add downstream ci

* ci: add ci-config

* chore(deps): remove unused dependency

* docs: update changelog

* ci: switch to main

* chore: changelog 0.2.1

* Update error messages from invalid sub_graph in model instantiation (#20)

* ci: inherit pypi publish flow (#17)

* ci: inherit pypi publish flow

Co-authored-by: Helen Theissen <[email protected]>

* docs: add to changelog

* fix: typo in reusable workflow

* fix: another typo

* chore: bump actions/setup-python to v5

* ci: run downstream-ci for changes in src and tests

* docs: update changelog

---------

Co-authored-by: Helen Theissen <[email protected]>

* Update CHANGELOG.md to KeepChangelog format

* [pre-commit.ci] pre-commit autoupdate (#25)

updates:
- [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0)
- [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2)
- [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Ci/changelog-release-updater (#26)

* ci: add changelof release updater

* docs: update changelog

* Feature/integrate reusable workflows (#16)

* ci: add public pr label

* ci: add readthedocs update check

* ci: add downstream ci

* ci: add ci-config

* chore(deps): remove unused dependency

* docs: update changelog

* ci: switch to main

* chore: changelog 0.2.1

* Update error messages from invalid sub_graph in model instantiation (#20)

* ci: inherit pypi publish flow (#17)

* ci: inherit pypi publish flow

Co-authored-by: Helen Theissen <[email protected]>

* docs: add to changelog

* fix: typo in reusable workflow

* fix: another typo

* chore: bump actions/setup-python to v5

* ci: run downstream-ci for changes in src and tests

* docs: update changelog

---------

Co-authored-by: Helen Theissen <[email protected]>

* Update CHANGELOG.md to KeepChangelog format

* Ci/changelog-release-updater (#26)

* ci: add changelof release updater

* docs: update changelog

---------

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
Co-authored-by: Gert Mertes <[email protected]>
Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
xfail for MultiHeadSelfAttention
mishooax previously approved these changes on Jan 20, 2025.

@mishooax (Member) left a comment:

This looks OK to me (I haven't run the code, but Cathal did) - nice work @theissenhelen.

@HCookie (Member) left a comment:

Brilliant work here; just a few thoughts and questions. Looks great overall.

Review threads on models/src/anemoi/models/layers/attention.py (all resolved).
@anaprietonem (Contributor) left a comment:

I have tested the PR with flash_attention and scaled_dot_product_attention and it's working fine! Previous comments were also addressed.

@HCookie changed the title from "feat: 44 make flash attention configurable" to "feat: make flash attention configurable" on Jan 30, 2025
@anaprietonem merged commit 41fcab6 into main on Jan 30, 2025 (28 checks passed)
@HCookie deleted the feature/44-make-flash-attention-configurable branch on January 30, 2025 at 14:03
Projects: Status: Done
7 participants